An anomaly detection method for identifying locations with abnormal behavior of temperature in school buildings

Time series data collected using wireless sensors, such as temperature and humidity, can provide insight into a building’s heating, ventilation, and air conditioning (HVAC) system. Anomalies of these sensor measurements can be used to identify locations of a building that are poorly designed or maintained. Resolving the anomalies present in these locations can improve the thermal comfort of occupants, as well as improve air quality and energy efficiency levels in that space. In this study, we developed a scoring method to identify sensors that shows collective anomalies due to environmental issues. This leads to identifying problematic locations within commercial and institutional buildings. The Dynamic Time Warping (DTW) based anomaly detection method was applied to identify collective anomalies. Then, a score for each sensor was obtained by taking the weighted sum of the number of anomalies, vertical distance to an anomaly point, and dynamic time-warping distance. The weights were optimized using a well-defined simulation study and applying the grid search algorithm. Finally, using a synthetic data set and the results of a case study we could evaluate the performance of our developed scoring method. In conclusion, this newly developed scoring method successfully detects collective anomalies even with data collected over one week, compared to the machine learning models which need more data to train themselves.

Unsupervised methods in anomaly detection do not rely on labeled anomaly data during training.Instead, they learn patterns and structures inherent in the data to detect anomalies based on deviations from the learned normal behavior.Some state-of-the-art approaches for unsupervised anomaly detection include Robust KDE (RKDE) 9 , Local Outlier Factor (LOF) 10 , mixture models (EGMM) 11 , one-class SVM (OCSVM) 12 , and autoencoders 13 , Isolation Forest (IForest) 14 .However, unsupervised methods also may have limitations in certain cases, such as difficulty in identifying the types of anomalies or sensitivity to the data distribution.Most of the unsupervised methods rely on the distribution and need to have proper parameter tuning.Quintana and the team have used Automated Load profile Discord Identification(ALDI) 15 to identify outliers in energy load profiles.But this method identifies outliers by testing a hypothesis for significantly different time series distributions.In our method we do not rely on the distribution, but the temporal behavior of the time series.
Transitioning to the application of this framework in building science, previous studies have primarily concentrated on pinpointing abnormal data points in a time series.However, a single abnormal point may not necessarily signify a problem within a specific building location.For a more comprehensive understanding, we must look beyond single-sensor anomalies and instead concentrate on clusters of abnormal behaviors across all sensors on a building floor.When the same abnormal behavior at the same time is observed in multiple sensors, it suggests a higher likelihood of a building anomaly.This approach allows us to identify potential issues with the building's HVAC system, control system, external envelope, or other infrastructure problems.
In 2019, Wang and his team has proposed an anomaly detection framework for thermal comfort in buildings 16 .It is a stochastic-based, two-step anomaly detection framework that is based on occupants' votes.An outlier is automatically flagged when a vote is significantly different from other occupants' votes.In the building industry, Fault diagnosis and detection (FDD) is an application of anomaly detection that monitors building HVAC systems to identify faults 17 .Graph-based anomaly detection methods also has been introduced by some researchers 18,19 Another research work 20 was done to detect anomalies in indoor office space by predicting values using long short-term memory (LSTM).They have used IoT sensors to collect temperature and humidity data.
The majority of existing research [21][22][23][24] have considered identifying abnormal points of a single time series (intra-time series anomaly detection).But, these single abnormal points can be raised due to equipment failures within the data collection hardware itself, which may lead to a false positive or false negative.Lieu et al. 21have used energy data, and a data point, which has fallen outside the 95% confidence bounds, is considered as an abnormal consumption.By dividing univariate time series into subsequences, Debanjana and Harry 22 could classify a data point as an anomaly or normal point using one-class support vector machines (OC-SVM) in 2020.In Wei et al. 's paper 23 , an unsupervised temperature anomaly detection method is proposed to detect anomalies in real-time temperature time series.It sets dynamic thresholds based on the Smoothed Z-Score Algorithm.
The importance of our study is that it identifies abnormal time series compared to every time series that we use in a study (inter-time series anomalies) and proposes a novel method to identify collective anomalies which can occur due to environmental issues.Furthermore, this method operates without the need for a labeled large training dataset and does not depend on the distribution of time series data.Ultimately the locations of these sensors can be considered as the locations that indicate a potential issue within the HVAC system or building construction.The scoring method development considered the number of anomalies, the vertical distance between the average point and an abnormal point, and the DTW distance between the average time series and the given time series.In the end, we applied the developed scoring method in a school building, located in New York, USA, and a synthetic data set, and discussed the results.

Anomaly detection
An anomaly is something that deviates from what is standard or expected 25 .The anomaly detection problem for time series is usually formulated in a way that can identify outlier data points relative to some usual signal.Anomaly detection within the context of buildings has real-world implications.Anomalies in building IEQ data can be caused by either environmental issues or hardware issues.A collective anomaly could identify inefficient controls or HVAC systems or could point to occupant behaviors negatively impacting the energy use within that space 26 .Contextual anomalies (as defined by a greater variation from the average within that same space) could identify failing or poorly calibrated hardware, or failure points within a building's mechanical or construction environment.Figure 1 shows the difference between these contextual and collective anomalies.www.nature.com/scientificreports/There are many approaches to detecting an anomaly in time-series data 27 and a few common methods are presented in Table 1.Seasonal Trend Decomposition 28 is one of the classic methods, and it uses a threshold to identify anomalies.Isolation Forest 29 is based on the classification models, and it does not consider any distance measures.The forecasting method 30 predicts the next point and based on the identified pattern from the past data it tries to identify anomalies.All these three methods have different methods of identifying anomalies.Hence we applied these three anomaly detection methods and DTW based anomaly detection method, which is explained in the next section, to sample time series data and selected the best method.The best method should identify collective anomalies, as they represent environmental issues.

Anomaly detection method based on dynamic time warping
This method is based on the anomaly detection method, which was introduced by Diab and team 31 .It has a control time series and a data time series.Anomalies are identified based on the distance between points of the optimal path.The optimal path means the optimal match between the control series and the given time series.The basic steps of this algorithm are as follows: 1. Identify the optimal similarity path between two series (using DTW). 2. Calculate Euclidean distances between those points in the optimal path.3. Calculate median absolute deviation (MAD) of distances.4. Consider the points which are 3 times MAD away from the median as outliers.5. Three or more consecutive outliers are considered as anomalies.
The importance of this algorithm is, it considers three or more consecutive outliers as anomalies.It detects only the collective anomalies, which are caused by environmental issues.However, this algorithm identifies anomalies based on a control sequence, but in practical situations, we will not always be able to find control sequences.Also, this method detects anomalies within single time series.But in our study, we want to identify abnormal time series compared to the all time series (inter-time series anomaly detection).Then it helps to identify abnormal sensors and locations with issues.Due to these limitations, we did some alterations to the existing method in order to apply it in our research study.
Instead of the control sequence, we used the median time series, which was created by calculating the median values of all the time series at each time point.Also when calculating the MAD value, we calculated all the distance values, comparing the median time series with each time series.By calculating MAD in that way, we could consider all the time series and identify the anomalies.

Dynamic time warping
Dynamic Time Warping (DTW) 32 measures the distance between two arrays or time series.DTW is a method that calculates an optimal match between two given sequences.This method allows us to find the distance between sequences of different lengths.Let A and B be two sequences with length L A and L B respectively.a i and b j indicate the ith and jth observations of A and B respectively.Then the pairwise euclidean distances 33 can be calculated for each observation of A and B. It will yield the L A × L B distance matrix S. The cumulative distance matrix D is calculated as in Eq. (1).The matrix D captures the total cost of alignment between the a 1 , b 1 and a L A , b L B .A lower total cost shows a higher similarity between the two sequences.
In the above equation i = 1, . . ., L A , j = 1, . . ., L B and S ij = d(a i , b j ) .The final distance between A and B is the bottom corner value of the cumulative distance matrix D, which is D(L A , L B )

Median absolute deviation
The Median Absolute Deviation (MAD) is a robust measure of how a set of data is spread out.The variance and standard deviation are also measures of spread, but they are more affected by extremely high or extremely low values.Also, in practical situations, it is hard to get normally distributed data.Hence, the MAD is one statistic that we can use instead.It is less affected by outliers because outliers have a smaller effect on the median than they do on the mean.MAD is defined as follows, (1) Table 1.Anomaly detection methods, used in this study to compare with the DTW-based method.

Method Summary
Seasonal trend decomposition 28 It only uses residue data from the decomposition to identify anomalies.This algorithm calculates the deviation of residue and uses a threshold to identify anomalies Isolation Forest method 29 It randomly selects a feature from the given set of features and then randomly selects a split value between the max and min values of that feature to isolate the outliers Forecasting method 30 The forecasting method is based on the approach that generates a predicted value of the next point, by considering several points from the past Vol:.(1234567890 www.nature.com/scientificreports/Here X i represents ith observation and according to our study X means the list distances between each data point of the optimal path, created using each time series and the control time series.

Abnormal sensor detection method
In this section, we explain the method we developed to identify sensors that can be considered abnormal compared to a group of sensors positioned within a building.The locations where those abnormal sensors are located can be considered as problematic areas of the building.Algorithm 1 summarizes the process of our algorithm.
We developed a scoring method to identify abnormal sensors, and identifying the main parameters which help to detect differences between two-time series was important.Considering literature and observations we decided to use the following parameters to compare the difference between the control and the given time series: 1. Number of anomalies.(based on section "Anomaly detection method based on dynamic time warping").2. Median of vertical Euclidean distances between outlier points, and control time series.

DTW distance between time series (check the similarity of the patterns).
We calculated the above values for all the sensors and normalized using min-max normalization and bring all values into one scale, from 0 to 1.Not all these three parameters have the same level of importance when detecting the difference between control and other time series.Hence, we considered generating a weighted score as above in algorithm 1, step 2.4, and ranking the time series based on these scores.However, since there is no historical data, on which are labeled as abnormal or not, optimizing the weights in a way that the abnormal sensors get the highest score was another challenge.Hence to overcome this, we had to develop a method to scale the weights using a simulation study.

Simulation study
Since we have no information about the actual abnormal sensors, we developed a simulation study to generate a set of 'normal' time series, and an 'abnormal' by considering the control time series.The flow chart in Fig. 2 summarizes the simulation work.
First, we had to identify the time series model of the median time series in order to generate a set of normal time series.Here we considered auto-regressive model (AR), moving-average model (MA), Auto Regressive Moving Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models which are expressed using the Eqs.( 3), ( 4), ( 5) and ( 6) respectively.In these equations, y t means y, response variable, measured at time t, and β , alpha values represent the coefficients of the AR model and MA model respectively.ε t expresses The Augmented Dickey-Fuller test 34 is used to check the stationarity of the time series.If the time series is stationary, we proceeded to AR, MA, or ARMA models and selected the best model among them using the lowest AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) values 35 .Otherwise we had to take the difference to convert to stationary series and consider the ARIMA model with the rest of the models.Partial Autocorrelation (PACF) plots and Autocorrelation (ACF) plots 36 were used to find the order of AR and MA respectively.The differencing order of the ARIMA model is based on the number of times we took the difference to make the series stationary.After identifying the model we simulate 10 time series from the selected model.The next step was generating an abnormal time series which showed unusual behavior compared to the rest of the time series.For that, we applied the AR(1) plus noise state-space model 37 to generate our abnormal time series and it gave a distinguishable time series as in Fig. 3.The AR(1) plus state space model can be explained as follows.
Here Eq. ( 7) is the observation equation, and Eq. ( 8) is the state equation.x t denotes the state at time t, the tran- sition matrix is [φ] , the observation matrix is [1], and the transition offset is c.The observation and transition noises, WN(0, σ 2 ) are indicated by ε t and w t respectively.The expectation of state is E[X t ] = c 1−φ .Now we have 11 time series: the first 10 time series are normal and the 11th series is abnormal.Using these 11 time series, we generated a median time series and calculated anomaly count, vertical distance, and DTW distance.Then we used the grid search optimization method 38 to optimize the weights so that the abnormal sensor has the highest score.Also, the difference between the lowest score and the maximum score should be maximized.Here the score shows the abnormality of the sensor out of 100.The following algorithm 2 explains the weight-optimizing process. (3)

Data collection and preparation
In order to test our algorithm, we used data from a school building in the USA and synthetic data which contains both collective and contextual anomalies.The school building is located in New York, USA.New York school building's data 39 were collected in 2018 from March to July (summer season), and 11 sensors were placed inside the building to collect data.Sensors were placed in different locations throughout the school building in a variety of different rooms, and collected temperature data over a period of time.
For the synthetic dataset, we used an existing time series dataset and recreated similar time series with contextual anomalies and one different time series with collective anomalies.This synthetic dataset will help us to evaluate whether our method is capable of detecting the most abnormal time series due to collective anomalies.The following two plots in Fig. 4 show the variations and trends of the temperature time series of the synthetic data and the school building data.
Based on Fig. 4, in synthetic data, we can clearly see some collective outliers in the 'orange' color time series.Other time series are similar to mean time series, but they show some contextual (single) outliers.Since our algorithm should identify anomalies dues to environmental issues, it should identify the 'orange' color time series as the most abnormal one.In New York building's we can see lots of abnormal points and sensor data with different patterns and higher variation.However, our method should be able to identify abnormal sensors in both scenarios, where the variation is high and low.

Current anomaly detection methods vs DTW based anomaly detection method
Popular anomaly detection methods mainly identify abnormal points, not an abnormal time window.In this study, we tested the anomaly detection ability of some of these popular methods: STL decomposition, Isolation Forest, Forecasting method, and DTW method.Plots in Fig. 5 show a sample temperature time series from a building and how each different method identifies anomalies of that time series.Temperature sharply decreases from 23 to 21 • C and increases back to 23 • C within a short period of time.Facility managers would like to iden- tify these type of time windows, which shows indoor environmental abnormal behaviors.
Based on the results in Fig. 5 we could see that only the DTW-based method identifies collective anomalies in a perfect way that we want to detect building anomalies.Hence we proceeded with the DTW-based method described previously.This method helped to identify abnormal time windows, not only the abnormal points, providing more applicable results in order to identify abnormal time series in buildings.

New York school building
In this section, we will discuss the results of the abnormal sensors detection process of the New York building.All the sensors are zone temperature thermostats, measuring temperature in a variety of room types.Room categories include gymnasiums, science wings, locker rooms, auditoriums, and more.As we explained in the methodology section, first we generated the median time series and considered it as the control time series.Figure 6 shows temperature vs time plots which contain all the sensors and the median time series of those sensor data.
Then we needed to identify the time series model which fits best for the median time series.First, we checked the stationarity of the time series using the Augmented Dickey-Fuller test.Based on the results the time series was a non-stationary time series (p = 0.0835; failed to reject the null hypothesis).Then we had to choose the ARIMA model instead of the AR, MA, and ARMA models.After the 1st order differencing, the p-value drops beyond the 0.05 threshold.Hence we could consider the order of differencing as 1.To fit the models, we had to identify the order of AR and MA models and the following PCAF and ACF plots in Fig. 7 helped us to identify those orders.www.nature.com/scientificreports/Based on the PACF, since there are strong correlations till lag = 1, we assigned 2 as the order of the AR model.In the ACF plot, there are lots of strong correlations and because of that, we used 1 as the order of the MA model, as it is better to consider a less complex model.Using the orders of AR, and MA processes and order of differencing we fitted the ARIMA model, ARIMA(1,1,1).Then we used the estimated parameters of the ARIMA model to simulate the time series.The fitted ARIMA model is: The original median time series had an increasing trend and using the differencing method we could detrend and make it stationary to fit the time series model.Once we detrended the time series we could compare the de-trended time series with the simulated time series.Figure 8 shows the original median, de-trended, and a simulated time series.
After that, we generated an abnormal time series and using the grid search, we explained in algorithm 2, we could optimize the weights of the score function ( ψ ).The final score function of ith time series of this building is as follows: The final score values and ranks of each sensor, based on the anomaly count, vertical distance and DTW distance compared to the real median time series, are shown in the following Table 2. Based on the table 'sensor 222238' can be considered as the most abnormal sensor while 'sensor 222365' is the least abnormal sensor.It can be clearly seen that 'sensor 222365' follows the median time series though it has one outlier point in the middle of the time series.Figure 9 shows the most abnormal and least abnormal sensors compared to the median time series.

Synthetic data set
This section explains the results of anomaly detection of synthetic data.As with the New York analysis, first, we had to take the median time series by considering the time series of all sensors.The time series of each sensor and median time series are shown in Fig. 10.
We checked the stationarity of the median time series which was found to be stationary (p-value = 0).The difference was taken in order to make the median time series stationary.Then we end up with ARMA (2,1) model as the best-fitted model.The fitted ARMA model is, Then we generated an abnormal time series and 10-time series using the selected model.Using those time series and grid search methods we found the weights for the score function of each ith time series as follows,  After that, we calculated the actual anomaly count, mean vertical distance from an anomaly to the respective median time series points, and DTW distance for each sensor.Scores were found for each sensor using the score function which is summarized in the Table 3.
Based on the scores, the 'sensor S5' can be considered as the most abnormal sensor while the 'sensor s3' is the least abnormal sensor.Figure 11 shows the difference between these two time series plots.The most abnormal sensor clearly stands out from the median time series.

Discussion and conclusion
This study was conducted in order to identify the locations with abnormal behaviors of indoor temperature, and we selected a school building and a synthetic dataset to test our method.Based on the results of those datasets, it is clearly shown that our algorithm using the weighted scoring system is capable of identifying sensors with abnormal behaviors.Sensors with low scores show a better alignment with the median time series while the sensors with higher scores show significant changes in the pattern of time series when compared with the median series.
Identifying abnormal performances of buildings is important for many parties in different ways.It helps building occupants improve their own comfort, while owners can save money by reducing energy waste and utility costs.Hence anomaly detection in buildings is an interesting research work, which is attracted by many building owners, tenants and researchers.
Our study offers significant advantages over existing models: it can effectively identify anomalies without the need for labeled training data, making it highly versatile even with a small dataset.Furthermore, this method does not depend on the distribution of time series data, enhancing its applicability.Despite its simplicity, it effectively identifies anomalies based on domain knowledge in the field of building science.
This anomaly detection process in buildings can help identify areas of concern that are difficult to find observationally or require a significant and costly energy audit to diagnose.These anomalies can be tied back to issues with a building's mechanical system, spaces that are not conditioned to their real occupancy use, inefficient control systems, inefficient building envelope, insulation or window issues, and beyond.Anomaly detection could also help identify thermostats or other data collection equipment that are poorly performing or miscalibrated and may be transmitting false data that has a negative impact on that space.
However, an additional benefit of this methodology is to be able to test a building on multiple occasions to verify long-term trends and any degradation or loss of indoor environmental quality over time.A pattern of increasing frequency or severity in anomaly detection can be a key indicator of failing or degrading building conditions.Preventative analytics and fault detection can help to identify the presence of these issues before they reach a failure point, which will allow facilities managers to proactively identify and resolve known issues.
Though we used the median time series as the control time series, building owners and operators can change it to a known control time series, if available within the building.For example, if the building is on a temperature setpoint schedule, instead of the median time series, they can easily change this model to identify sensors that do not follow the setpoint schedule.
Not all anomalies are ignorable, and sometimes we do not want to consider some series as abnormal series, though we have some degree of abnormality.Hence, for future work, we are planning to improve our algorithm in a way that can identify whether the anomaly is acceptable or ignorable automatically.In this study, we only considered univariate anomalies, but in the future, we would like to develop a multivariate abnormal sensor detection algorithm as it can help to identify the performance issues of the building, quicker and more accurately than a human could.

Figure 1 .
Figure 1.(a) Contextual anomalies: due to hardware issue, (b) collective anomalies: due to environmental issue.

Figure 2 .
Figure 2. Flow chart of the simulation study.

Figure 3 .Algorithm 2 .
Figure 3. Simulated time series using state space model; blue indicates normal time series and red indicates abnormal time series.

Figure 4 .
Figure 4. Temperature variation with time plots: (a) New York school building, (b) synthetic data.

Figure 5 .Figure 6 .
Figure 5. Results of anomaly detection methods: (a) STL decomposition method, (b) isolation forest method, (c) forecasting method, (d) DTW-based method.Anomaly points are shown in red color points.

Table 2 .
Result table of New York Building: the most abnormal sensor is highlighted in bold, the least abnormal sensor is highlighted in italics..